TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation
Text-VQA aims at answering questions that require understanding the textual
cues in an image. Despite the great progress of existing Text-VQA methods,
their performance suffers from insufficient human-labeled question-answer (QA)
pairs. However, we observe that, in general, the scene text is not fully
exploited in the existing datasets -- only a small portion of the text in each
image participates in the annotated QA activities. This results in a huge waste
of useful information. To address this deficiency, we develop a new method to
generate high-quality and diverse QA pairs by explicitly utilizing the existing
rich text available in the scene context of each image. Specifically, we
propose TAG, a text-aware visual question-answer generation architecture that
learns to produce meaningful and accurate QA samples using a multimodal
transformer. The architecture exploits underexplored scene text information and
enhances scene understanding of Text-VQA models by combining the generated QA
pairs with the initial training data. Extensive experimental results on two
well-known Text-VQA benchmarks (TextVQA and ST-VQA) demonstrate that our
proposed TAG effectively enlarges the training data, which helps improve
Text-VQA performance without extra labeling effort. Moreover, our model
outperforms state-of-the-art approaches that are pre-trained with extra
large-scale data. Code is available at https://github.com/HenryJunW/TAG.
Comment: BMVC 202
Challenges in Representation Learning: A report on three machine learning contests
The ICML 2013 Workshop on Challenges in Representation Learning focused on
three challenges: the black box learning challenge, the facial expression
recognition challenge, and the multimodal learning challenge. We describe the
datasets created for these challenges and summarize the results of the
competitions. We provide suggestions for organizers of future challenges and
some comments on what kind of knowledge can be gained from machine learning
competitions.
Comment: 8 pages, 2 figures